Abstract

We prove the asymptotic independence of the empirical process $\alpha_n = \sqrt{n}(F_n - F)$ and the rescaled empirical distribution function $\beta_n = n(F_n(\tau + \cdot/n) - F_n(\tau))$, where F is an arbitrary cdf, differentiable at some point $\tau$, and $F_n$ the corresponding empirical cdf. This seems rather counterintuitive, since, for every $n \in \mathbb{N}$, there is a deterministic correspondence between $\alpha_n$ and $\beta_n$.

Precisely, we show that the pair $(\alpha_n, \beta_n)$ converges in law to a limit having independent components, namely a time-transformed Brownian bridge and a two-sided Poisson process. Since these processes have jumps, in particular if F itself has jumps, the Skorokhod product space $D(\mathbb{R}) \times D(\mathbb{R})$ is the adequate choice for modeling this convergence. We develop a short convergence theory for $D(\mathbb{R}) \times D(\mathbb{R})$ by establishing the classical principle, devised by Yu. V. Prokhorov, that finite-dimensional convergence and tightness imply weak convergence. Several tightness criteria are given. Finally, the convergence of the pair $(\alpha_n, \beta_n)$ implies convergence of each of its components; thus, in passing, we provide a thorough proof of these known convergence results in a very general setting. In fact, the condition on F to be differentiable in at least one point is only required for $\beta_n$ to converge and can be further weakened.

Keywords: Skorokhod topology, Brownian bridge, Poisson process, tightness, finite-dimensional distribution


1 Introduction 

This paper brings together two important convergence results in empirical process theory. 
The first one is the convergence in law of the uniform empirical process (u.e.p.) $\sqrt{n}(G_n(t) - t)$, $t \in [0,1]$, to the Brownian bridge. Here $G_n$ denotes the uniform empirical distribution function (u.e.d.f.). This result is originally due to M. D. Donsker [Don52], who carried out an idea by J. L. Doob [Doo49]. The work was motivated by the pioneering papers of A. N. Kolmogorov [Kol33] and N. V. Smirnov [Smi44] about the limit distributions of $\sup_{t \in [0,1]} |\sqrt{n}(G_n(t) - t)|$ and $\sup_{t \in [0,1]} \sqrt{n}(G_n(t) - t)$, respectively.

The other one is the convergence of the rescaled uniform empirical distribution function (r.u.e.d.f.) $nG_n(t/n)$, $t \ge 0$, to the Poisson process with intensity 1. Although it is nowadays a standard exercise in empirical process theory, the origin of this result is, to this day, unknown to us. It appears in different levels of generality, e.g. in [KLS80], [AHE84] or [HH88].

The Brownian bridge, closely linked to the Brownian motion, and the Poisson process 
are two fundamental stochastic processes, the relevance of which goes far beyond being 
limit processes in asymptotic statistics. The empirical distribution function and derived 
processes (such as the empirical process) are an important field of study in mathematical 
statistics. See for example [SW86] or [vdVW96] for a thorough treatment of up-to-date empirical process theory with particular focus on statistical applications.

The aim of this paper is to prove the asymptotic independence of the u.e.p. and the r.u.e.d.f., and we are going to do so in a general setting. Instead of being uniformly distributed, the underlying sequence $\{X_n\}$ of i.i.d. random variables is sampled from an arbitrary distribution function F. Then we look at the following generalizations of the u.e.p. and r.u.e.d.f., respectively:

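$$\alpha_n^F(t) = \sqrt{n}\,\big(F_n(t) - F(t)\big), \quad t \in \mathbb{R},$$

$$\beta_n^{F,\tau}(t) = \begin{cases} n\,\big[F_n(\tau + \tfrac{t}{n}) - F_n(\tau)\big], & \text{if } t \ge 0,\\ n\,\big[F_n(\tau + \tfrac{t}{n}) - F_n(\tau-)\big], & \text{if } t < 0,\end{cases}$$

where $\tau$ is an arbitrary real constant and

$$F_n(t) = \frac{1}{n}\sum_{k=1}^{n} \mathbb{1}\{X_k \le t\}, \quad t \in \mathbb{R},$$

is the empirical cdf corresponding to F. The processes $\alpha_n^F$ and $\beta_n^{F,\tau}$ both converge in law to limits, say $B_1$ and $N_0$, respectively, that will be properly specified in Section 2. We are going to show that also

$$\big(\alpha_n^F, \beta_n^{F,\tau}\big) \xrightarrow{\mathcal{L}} (B_1, N_0), \qquad (1)$$

where $B_1$ and $N_0$ are stochastically independent.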

At this point we would like to say a few words about the implications of (1). It is quite a remarkable result. That the convergence extends from the individual sequences to the joint sequence is, although not to be taken for granted, hardly surprising. But $B_1$ and $N_0$ in (1) are independent, while $\alpha_n^F$ and $\beta_n^{F,\tau}$, being derived from the same sequence $\{X_n\}$, are clearly not. Consider a fixed $t \in \mathbb{R}$. One implication of (1) is that $\alpha_n^F(t)$ and $\beta_n^{F,\tau}(t)$ are asymptotically independent. This may seem plausible, since they are deterministic transformations of $F_n(t)$ and of $F_n$ in a $1/n$-neighborhood of $\tau$, respectively, and it is known that extreme and middle order statistics are asymptotically independent, cf. [Ros67]¹. But (1) states something even stronger, namely that the whole processes are asymptotically independent. This holds although, for any fixed n, $\alpha_n$ and $\beta_n$ are linked via the strongest form of stochastic dependence there is: knowing one means knowing the other.
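The claim that, for fixed n, knowing one of the two processes means knowing the other (F and $\tau$ being known) can be checked on a simulated sample. The following Python sketch is only an illustration under our own assumptions: a uniform(0,1) sample, $\tau = 0.5$, and hypothetical helper names. It evaluates $\alpha_n$ and $\beta_n$ through counts and recovers $F_n$ in two ways. The first uses $\alpha_n$, via $F_n = F + \alpha_n/\sqrt{n}$. The second uses the path of $\beta_n$, with $F_n(-\infty) = 0$ and $F_n(+\infty) = 1$ pinning down $F_n(\tau-)$ and $F_n(\tau)$.

```python
import math
import random


def count_le(sample, x):
    """Number of sample points <= x, i.e. n * F_n(x)."""
    return sum(1 for v in sample if v <= x)


def count_lt(sample, x):
    """Number of sample points < x, i.e. n * F_n(x-)."""
    return sum(1 for v in sample if v < x)


def alpha(sample, cdf, t):
    """Empirical process alpha_n(t) = sqrt(n) * (F_n(t) - F(t))."""
    n = len(sample)
    return math.sqrt(n) * (count_le(sample, t) / n - cdf(t))


def beta(sample, tau, t):
    """Rescaled edf: n[F_n(tau + t/n) - F_n(tau)] for t >= 0, with F_n(tau-) for t < 0."""
    n = len(sample)
    reference = count_le(sample, tau) if t >= 0 else count_lt(sample, tau)
    return count_le(sample, tau + t / n) - reference


def edf_from_alpha(alpha_value, cdf, n, x):
    """Recover F_n(x) from alpha_n(x), assuming the cdf F is known."""
    return cdf(x) + alpha_value / math.sqrt(n)


def edf_from_beta(beta_path, n, tau, x, spread):
    """Recover F_n(x) from the path t -> beta_n(t).

    `spread` (hypothetical parameter) must exceed max_k |X_k - tau|, so that
    beta_path(+/- n * spread) plays the role of beta_n(+/- infinity).
    """
    t_far = n * spread
    if x >= tau:
        f_tau = 1.0 - beta_path(t_far) / n          # uses F_n(+inf) = 1
        return f_tau + beta_path(n * (x - tau)) / n
    f_tau_left = -beta_path(-t_far) / n             # uses F_n(-inf) = 0
    return f_tau_left + beta_path(n * (x - tau)) / n


def uniform_cdf(x):
    return min(max(x, 0.0), 1.0)


rng = random.Random(1)
sample = [rng.random() for _ in range(50)]
n, tau = len(sample), 0.5
spread = max(abs(v - tau) for v in sample) + 1.0


def beta_path(t):
    return beta(sample, tau, t)


for x in [0.1, 0.33, 0.5, 0.77, 0.95]:
    direct = count_le(sample, x) / n
    via_alpha = edf_from_alpha(alpha(sample, uniform_cdf, x), uniform_cdf, n, x)
    via_beta = edf_from_beta(beta_path, n, tau, x, spread)
    print(f"x={x:.2f}  F_n={direct:.3f}  from alpha={via_alpha:.3f}  from beta={via_beta:.3f}")
```

Both reconstructions agree with $F_n$ at every evaluation point. The argument `spread` merely emulates $t \to \pm\infty$ for a finite sample.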

When it comes to proving the result, the first question arising is: weak convergence in which measurable space? Since we canonically take Borel σ-fields, it comes down to choosing a topological space, which desirably is metrizable and separable. The processes involved have discontinuous paths and the whole real line as their time domain. Thus, for example, the nice separable metric space $(C[0,1], \|\cdot\|_\infty)$, the space of all continuous functions on $[0,1]$, is not an option. But the trajectories of all processes are right-continuous, and the left-hand limits exist at all points, i.e. they are càdlàg functions ("continue à droite, limites à gauche", i.e. right-continuous with left limits; sometimes also called rcll). The space of all càdlàg functions on the time domain T is usually denoted by $D(T)$.

An element of $D(\mathbb{R})$ stays bounded on a compact set, just as a continuous function does. Hence the sup-metric $\|\cdot\|_\infty$ is a possible metric for $D[0,1]$. It induces the topology of uniform convergence, for short the uniform topology. However, this metric is unsuitable for $D[0,1]$, for several reasons. First, $(D[0,1], \|\cdot\|_\infty)$ is not separable (see e.g. [JS02], page 325). Second, and more severe, there are measurability problems: the empirical process is not measurable with respect to the Borel σ-field of the uniform topology. In fact, Donsker's original proof of the weak convergence of the u.e.p. was flawed, because he used this topology.

In 1956, A. V. Skorokhod [Sko56] proposed several other topologies on $D[0,1]$, of which the $J_1$-topology has become the most popular. It is coarser than the uniform topology, separable and metrizable, and it solves the measurability issue. It allows for a workable Arzelà-Ascoli-type compactness characterization, and it also provides a notion of convergence that is more natural for functions with jumps. Nowadays, $D[0,1]$ is by default equipped with the $J_1$-topology and simply referred to as the Skorokhod space. We endow $D(\mathbb{R})$ with a proper extension of this $J_1$-topology (by the same means by which one declares a uniform topology for functions on the real line, cf. Section 3) and then treat the convergence statement (1) in the Skorokhod product space $D(\mathbb{R}) \times D(\mathbb{R})$. The proof then breaks down into two tasks: derive a weak convergence criterion in the space $D(\mathbb{R}) \times D(\mathbb{R})$ (Theorem 5.2), and show that $(\alpha_n, \beta_n)$ satisfies it (Section 6).

The standard method of proving weak convergence of stochastic processes is as follows: prove the weak convergence of the finite-dimensional distributions, and show that the sequence is tight. The key argument here is Prokhorov's theorem [Pro56]. For example, this method is used to show that the partial sum process converges to Brownian motion in $(C[0,1], \|\cdot\|_\infty)$ (Donsker's theorem [Don51]). The principle transfers with little alteration to $D[0,1]$ and $D(\mathbb{R})$. We will show that it extends as well to $D(\mathbb{R}) \times D(\mathbb{R})$. It is, however, only feasible if the finite-dimensional distributions are known, and there are other approaches as well, see e.g. [JS02].

¹ The authors thank E. Hausler for pointing out the reference.


The paper is organized as follows. Section 2 states the task in detail; the principal statement of this paper is formulated in Theorem 2.1. Most of the rest of the paper is devoted to its proof. Section 3 introduces the space $D(\mathbb{R})$ and states the classic convergence criterion, and Section 4 deals with tightness in $D(\mathbb{R})$. In Section 5 we begin to develop a short weak convergence theory for the product space $D(\mathbb{R}) \times D(\mathbb{R})$ and prove an analogous convergence criterion. Finally, we apply the latter to show Theorem 2.1 in Section 6. The paper ends with Section 7, which gives a short description of how Theorem 2.1 can be used in statistics.

We conclude the introduction with some remarks on the literature. Most of what we use are classical results covered in a variety of textbooks. Our main reference is Patrick Billingsley's "Convergence of Probability Measures" [Bil99]. This book's first edition dates back to 1968 and features a stage-wise development from $C[0,1]$ to $D[0,1]$ to $D[0,\infty)$. A number of newer books, e.g. [Pol84], [EK86], [Whi02] and [JS02], consider the space $D[0,\infty)$, or a more general version of it, right away, without paying extra attention to $D[0,1]$, and hence tend to be more profound. In particular, [JS02] gives an exhaustive treatment of weak convergence on $D[0,\infty)$.

Besides [Sko56], the other two important papers on $D[0,1]$ are Kolmogorov [Kol56] and Prokhorov [Pro56]. The analogue on $D[0,\infty)$ is due to C. Stone [Sto63]. Billingsley [Bil99] gives a construction of a complete metric on $D[0,\infty)$. He adopts a suggestion of T. Lindvall [Lin73], who in turn follows W. Whitt's approach on $C[0,\infty)$ [Whi70]. Whitt also suggests another metric on $D[0,\infty)$ [Whi71].

2 Main result 

Let F be an arbitrary distribution function, and $X_1, X_2, \dots$ a sequence of i.i.d. random variables distributed according to F. The corresponding empirical distribution function (edf) is given by

$$F_n(t) = \frac{1}{n}\sum_{k=1}^{n} \mathbb{1}\{X_k \le t\}, \quad t \in \mathbb{R},\ n \ge 1.$$

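The following family $\{\alpha_n^F \mid n \in \mathbb{N}\}$ of random functions is called the empirical process:

$$\alpha_n^F(t) = \sqrt{n}\,\big(F_n(t) - F(t)\big) = \frac{1}{\sqrt{n}}\sum_{k=1}^{n}\big(\mathbb{1}\{X_k \le t\} - F(t)\big), \quad t \in \mathbb{R},\ n \ge 1.$$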
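Furthermore, for any real number $\tau$, let

$$\beta_n^{F,\tau}(t) = \begin{cases} n\,\big[F_n(\tau + \tfrac{t}{n}) - F_n(\tau)\big], & \text{if } t \ge 0,\\ n\,\big[F_n(\tau + \tfrac{t}{n}) - F_n(\tau-)\big], & \text{if } t < 0,\end{cases} \;=\; \begin{cases} \sum_{k=1}^{n} \mathbb{1}_{(\tau,\, \tau + t/n]}(X_k), & \text{if } t \ge 0,\\ -\sum_{k=1}^{n} \mathbb{1}_{(\tau + t/n,\, \tau)}(X_k), & \text{if } t < 0.\end{cases} \qquad (2)$$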
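The two expressions in (2) can be compared directly. In the following Python sketch (helper names and the sample are hypothetical), the first function uses the definition via $F_n$, written with counts $nF_n(\cdot)$, and the second uses the counting representation. The sample deliberately contains a tie and two observations exactly at $\tau$.

```python
def beta_definition(sample, tau, t):
    """beta_n(t) via the first form of (2), written with counts n * F_n(.)."""
    n = len(sample)
    upper = sum(1 for v in sample if v <= tau + t / n)       # n * F_n(tau + t/n)
    if t >= 0:
        return upper - sum(1 for v in sample if v <= tau)     # minus n * F_n(tau)
    return upper - sum(1 for v in sample if v < tau)          # minus n * F_n(tau-)


def beta_counting(sample, tau, t):
    """beta_n(t) via the counting form of (2)."""
    n = len(sample)
    edge = tau + t / n
    if t >= 0:
        return sum(1 for v in sample if tau < v <= edge)      # X_k in (tau, tau + t/n]
    return -sum(1 for v in sample if edge < v < tau)          # X_k in (tau + t/n, tau)


# Hypothetical sample: a tie at 0.5 and two observations exactly at tau = 0.5.
sample = [0.2, 0.49, 0.5, 0.5, 0.51, 0.7]
tau = 0.5
for t in [-2.0, -0.05, 0.0, 0.03, 0.1, 1.0, 5.0]:
    print(t, beta_definition(sample, tau, t), beta_counting(sample, tau, t))
```

Both functions agree for every t tried. The two observations at $\tau$ never contribute, since they lie in neither $(\tau, \tau + t/n]$ nor $(\tau + t/n, \tau)$: $\beta_n$ does not see an atom of $F_n$ at $\tau$.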


We call the family $\{\beta_n^{F,\tau} \mid n \in \mathbb{N}\}$ the rescaled empirical distribution function. Whenever it is clear or not of interest which F and $\tau$ are meant, we will simply write $\alpha_n$ and $\beta_n$. In Theorem 2.1 we will make the following basic assumption on F.


Condition C.1 F has both left- and right-hand derivatives at $\tau$. Call the former $\rho_1$ and the latter $\rho_2$, i.e.

$$\rho_1 = \lim_{h \nearrow 0} \frac{F(\tau + h) - F(\tau-)}{h} \qquad (3)$$

and

$$\rho_2 = \lim_{h \searrow 0} \frac{F(\tau + h) - F(\tau)}{h}. \qquad (4)$$


Pay attention to the $\tau-$ in (3). This definition of the left-hand derivative does not require F to be continuous at $\tau$.
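To see why (3) uses $F(\tau-)$, consider a hypothetical cdf with a kink and a jump at $\tau = 0.5$. It has slope 1 to the left of $\tau$, a jump of size 0.1 at $\tau$, and slope 0.8 to the right, so Condition C.1 holds with $\rho_1 = 1$ and $\rho_2 = 0.8$. The Python sketch below (our own example, not from the paper) evaluates the difference quotients from (3) and (4). Replacing $F(\tau-)$ by $F(\tau)$ in the left quotient makes it diverge.

```python
def F(x):
    """Hypothetical cdf: slope 1 on [0, 0.5), jump of 0.1 at 0.5, slope 0.8 on [0.5, 1)."""
    if x < 0.0:
        return 0.0
    if x < 0.5:
        return x
    if x < 1.0:
        return 0.6 + 0.8 * (x - 0.5)
    return 1.0


TAU = 0.5
F_TAU_LEFT = 0.5  # left-hand limit F(tau-), known in closed form for this F


def left_quotient(h):
    """Difference quotient from (3), h < 0: uses F(tau-)."""
    return (F(TAU + h) - F_TAU_LEFT) / h


def right_quotient(h):
    """Difference quotient from (4), h > 0."""
    return (F(TAU + h) - F(TAU)) / h


def naive_left_quotient(h):
    """Same as left_quotient but with F(tau) instead of F(tau-): blows up at a jump."""
    return (F(TAU + h) - F(TAU)) / h


for h in [1e-2, 1e-4, 1e-6]:
    print(h, left_quotient(-h), right_quotient(h), naive_left_quotient(-h))
```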

The next step is to specify the limit processes of $\{\alpha_n\}$ and $\{\beta_n\}$. Let $B_0 = \{B_0(t) \mid t \in [0,1]\}$ be a Brownian bridge and

$$B_1 = B_1^F = B_0 \circ F, \quad \text{i.e.} \quad B_1^F(t) = B_0(F(t)), \quad t \in \mathbb{R}.$$

$B_1^F$ is a Gaussian process with expectation zero and covariance function $\operatorname{cov}(s,t) = F(s)(1 - F(t))$ for $s \le t$. Furthermore, let $N_1$, $N_2$ be two Poisson processes with the following properties:


• $N_1$ and $N_2$ are independent of $B_1$, and of each other.

• $N_i$ has rate $\rho_i$, $i = 1, 2$.

• $N_2$ has, as usual, right-continuous trajectories, while those of $N_1$ are left-continuous, i.e. the value at a jump point is always set to the left-hand limit. Note that this leaves the finite-dimensional distributions unchanged.

Then define $N_0 = N_0^{\rho_1,\rho_2}$ by

$$N_0(t) = \begin{cases} -N_1(-t), & t < 0,\\ N_2(t), & t \ge 0.\end{cases}$$
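The limit pair $(B_1^F, N_0^{\rho_1,\rho_2})$ is easy to sample. The following Python sketch is an illustration under stated assumptions, not part of the proof:

- it uses the representation $B_0(u) = W(u) - uW(1)$ of a Brownian bridge through a standard Brownian motion W;
- it takes F to be the standard exponential cdf and uses the hypothetical rates $\rho_1 = 1$, $\rho_2 = 0.8$;
- it samples $N_1$ and $N_2$ independently of the bridge.

It then compares the empirical covariance of $(B_1(s), B_1(t))$ with $F(s)(1 - F(t))$, and the empirical means of $N_0(2)$ and $N_0(-2)$ with $2\rho_2$ and $-2\rho_1$.

```python
import math
import random


def bridge_at(u_values, rng):
    """Sample a Brownian bridge B_0 at increasing points of [0, 1] via B_0(u) = W(u) - u W(1)."""
    w, prev, path = 0.0, 0.0, []
    for u in list(u_values) + [1.0]:
        w += rng.gauss(0.0, math.sqrt(u - prev))   # independent Gaussian increments of W
        prev = u
        path.append(w)
    w_one = path[-1]
    return [w_u - u * w_one for u, w_u in zip(u_values, path)]


def poisson_arrivals(rate, horizon, rng):
    """Arrival times in (0, horizon] of a homogeneous Poisson process with the given rate."""
    if rate <= 0:
        return []
    times, t = [], rng.expovariate(rate)
    while t <= horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times


def n0_path(arrivals_1, arrivals_2):
    """t -> N_0(t): -N_1(-t) for t < 0 (N_1 left-continuous), N_2(t) for t >= 0."""
    def n0(t):
        if t < 0:
            return -sum(1 for a in arrivals_1 if a < -t)   # left-continuous N_1 at -t
        return sum(1 for a in arrivals_2 if a <= t)         # right-continuous N_2 at t
    return n0


def exp_cdf(x):
    """Standard exponential cdf (a hypothetical choice of F)."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0


rng = random.Random(7)
s, t, rho1, rho2, reps = 0.5, 1.5, 1.0, 0.8, 20_000
sum_prod = sum_s = sum_t = 0.0
sum_plus = sum_minus = 0
for _ in range(reps):
    b_s, b_t = bridge_at([exp_cdf(s), exp_cdf(t)], rng)            # B_1(s), B_1(t)
    n0 = n0_path(poisson_arrivals(rho1, 2.0, rng), poisson_arrivals(rho2, 2.0, rng))
    sum_prod += b_s * b_t
    sum_s += b_s
    sum_t += b_t
    sum_plus += n0(2.0)
    sum_minus += n0(-2.0)

cov_hat = sum_prod / reps - (sum_s / reps) * (sum_t / reps)
print("cov:", cov_hat, "vs F(s)(1 - F(t)) =", exp_cdf(s) * (1 - exp_cdf(t)))
print("E N_0(2):", sum_plus / reps, "vs", 2.0 * rho2)
print("E N_0(-2):", sum_minus / reps, "vs", -2.0 * rho1)
```

The left-continuity of $N_1$ is what makes $t \mapsto -N_1(-t)$ right-continuous. In `n0`, an arrival a of $N_1$ is counted only for $-t > a$, so $N_0$ jumps up at $t = -a$ and already takes the upper value there.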
All four stochastic processes we have introduced so far, $\alpha_n$, $\beta_n$, $B_1$ and $N_0$, have trajectories in the Skorokhod space $D = D(\mathbb{R})$. This is the space of all càdlàg functions on $\mathbb{R}$, equipped with the Skorokhod topology ($J_1$-topology). The Skorokhod space is properly introduced in Section 3. Whenever we write D, the topological space is meant. This also applies to $D \times D$ (product topology). The Skorokhod space D is a Polish space. Its Borel σ-field shall be denoted by $\mathcal{D}$.

It is known, and in the case of the uniform(0,1) distribution considered to be folklore, that

(A) $\alpha_n^F \xrightarrow{\mathcal{L}} B_1^F$ in D, and

(B) if C.1 is satisfied, $\beta_n^{F,\tau} \xrightarrow{\mathcal{L}} N_0^{\rho_1,\rho_2}$ in D.

Remark on (B). The definition (2) of $\{\beta_n\}$ resembles a sequence of difference quotients. It is therefore not surprising that the derivative of F at $\tau$ appears as a parameter in the limit process of $\{\beta_n\}$. But F does not have to be differentiable at $\tau$. It may have a sharp bend at $\tau$ (i.e. left- and right-hand derivatives differ) or even a jump (left- and right-hand limits differ). The behavior of the process $\beta_n$ on the positive half-axis is determined by the behavior of F in the right-hand vicinity of $\tau$, and likewise for the negative half-axis. Thus the assumption of differentiability can be weakened to Condition C.1.

In the proof of Proposition 6.4 it becomes clear that, instead of C.1, we only require the (formally) weaker condition that all of the limits

$$\lim_{n\to\infty} n\Big[F\big(\tau + \tfrac{t}{n}\big) - F(\tau-)\Big] \ \text{ for } t < 0 \qquad \text{and} \qquad \lim_{n\to\infty} n\Big[F\big(\tau + \tfrac{t}{n}\big) - F(\tau)\Big] \ \text{ for } t > 0$$

exist in order to have convergence of the process $\{\beta_n\}$. However, it can be shown that for monotone functions F these limits, divided by t, do not depend on t on either half-axis and, even more, that the left- and right-hand derivatives exist at $\tau$. Finally, it should be noted that C.1 is a very weak condition: it does not, for instance, imply continuity in an upper or lower neighborhood of $\tau$.

The following result is new. 

Theorem 2.1 Under Condition C.1, $(\alpha_n^F, \beta_n^{F,\tau})$ converges in distribution to $(B_1^F, N_0^{\rho_1,\rho_2})$ in $D \times D$.

Remarks. 

(I) Keep in mind that we have defined $N_0$ to be independent of $B_1$, which specifies the distribution of $(B_1, N_0)$. The remarkable feature of Theorem 2.1 is not the convergence itself. It is rather the fact that the "highly dependent" $\alpha_n$ and $\beta_n$ (knowing one means knowing the other) converge to independent limits.

(II) Of course, (B) follows from Theorem 2.1. Note that, apart from the regularity condition C.1, F is completely arbitrary. The result does not seem to be contained as such in the literature. It should, however, be compared to Theorem 3.1 in [CH88].









The authors there consider processes of the type $nF_n(a_n t + b_n)$, where $\{a_n\}$ and $\{b_n\}$ are sequences of real numbers such that the normalized first order statistic $(X_{1:n} - b_n)/a_n$ converges to a non-degenerate limit. Such an extreme-value process may coincide with $\beta_n^{F,\tau}$ if $\tau$ is the left endpoint of the support of F, $\{b_n\}$ is constant equal to $\tau$, and $a_n = 1/n$. In this situation, our condition C.1 with $\rho_2 > 0$ implies the assumption on F in [CH88]: F lies in the domain of attraction of the cdf $L_{2,1}(x) = (1 - e^{-x})\,\mathbb{1}_{(0,\infty)}(x)$ (Weibull distribution with shape parameter 1), which is an extreme-value distribution of type 2, cf. e.g. [Gal78], pp. 58, 76.

(III) Convergence in law is canonically defined on a Borel σ-field, the underlying topological space here being $D \times D$. On the other hand, $(\alpha_n, \beta_n)$ and $(B_1, N_0)$ are pairs of random variables and hence defined on the product measure space, i.e. with respect to the σ-field $\mathcal{D} \otimes \mathcal{D}$. Fortunately, $\mathcal{D} \otimes \mathcal{D}$ coincides with the Borel σ-field on $D \times D$ (cf. Lemma 5.1).
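Remark (I) can be made concrete at the level of covariances. Take $s \ge 0$ and expand $\alpha_n(t) = n^{-1/2}\sum_k(\mathbb{1}\{X_k \le t\} - F(t))$ and $\beta_n(s) = \sum_j \mathbb{1}_{(\tau,\tau+s/n]}(X_j)$. Only the diagonal terms $k = j$ survive in the covariance, so

$$\operatorname{cov}\big(\alpha_n(t), \beta_n(s)\big) = \sqrt{n}\,\Big[P\big(X_1 \le t,\ \tau < X_1 \le \tau + \tfrac{s}{n}\big) - F(t)\big(F(\tau + \tfrac{s}{n}) - F(\tau)\big)\Big].$$

For uniform(0,1) data and $t \ge \tau + s/n$ this equals $s(1-t)/\sqrt{n}$, which vanishes as $n \to \infty$. Of course, vanishing covariance is much weaker than the independence asserted in Theorem 2.1. The Python sketch below is our own illustration with hypothetical function names. It evaluates the exact formula for uniform data and cross-checks it by simulation for n = 100.

```python
import math
import random


def cov_alpha_beta_uniform(n, t, tau, s):
    """Exact cov(alpha_n(t), beta_n(s)) for uniform(0,1) data and s >= 0.

    Assumes 0 <= t <= 1 and 0 <= tau <= tau + s/n <= 1.
    """
    joint = max(0.0, min(t, tau + s / n) - tau)   # P(X <= t, tau < X <= tau + s/n)
    return math.sqrt(n) * (joint - t * (s / n))   # F(t) = t, F(tau + s/n) - F(tau) = s/n


def simulated_cov(n, t, tau, s, reps, rng):
    """Monte Carlo estimate of the same covariance (sample covariance over reps samples)."""
    a_vals, b_vals = [], []
    for _ in range(reps):
        sample = [rng.random() for _ in range(n)]
        a_vals.append((sum(1 for v in sample if v <= t) - n * t) / math.sqrt(n))
        b_vals.append(sum(1 for v in sample if tau < v <= tau + s / n))
    mean_a, mean_b = sum(a_vals) / reps, sum(b_vals) / reps
    return sum((a - mean_a) * (b - mean_b) for a, b in zip(a_vals, b_vals)) / (reps - 1)


for n in [100, 10_000, 1_000_000]:
    print(n, cov_alpha_beta_uniform(n, t=0.7, tau=0.5, s=1.0))   # 0.3 / sqrt(n)

mc = simulated_cov(100, t=0.7, tau=0.5, s=1.0, reps=10_000, rng=random.Random(3))
print("Monte Carlo, n = 100:", mc)
```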

The proof of Theorem 2.1 is the subject of Section 6.

3 Weak Convergence in D 

Preliminary note: most textbooks and articles consider the space $D[0,\infty)$ instead of $D(\mathbb{R})$. Both spaces are qualitatively equal. All results for $D[0,\infty)$ also hold for $D(\mathbb{R})$ with little notational change, which nonetheless demands its due amount of care. Define

$$D = D(\mathbb{R}) = \{x : \mathbb{R} \to \mathbb{R} \mid x(t-),\ x(t+) \text{ exist},\ x(t) = x(t+)\ \ \forall\, t \in \mathbb{R}\}.$$

Elements of D have at most countably many discontinuity points and are bounded on compact sets. We declare a topology on D by the following characterization of convergence. Let $\Lambda$ denote the class of all strictly increasing, continuous, surjective mappings $\lambda$ from $\mathbb{R}$ onto itself. A sequence $\{x_n\} \subset D$ converges to $x \in D$ if and only if a sequence $\{\lambda_n\} \subset \Lambda$ exists such that

$$\begin{cases} \lambda_n(t) \to t & \text{uniformly in } t \in \mathbb{R},\\ x_n(\lambda_n(t)) \to x(t) & \text{uniformly in } t \in [-m, m] \text{ for all } m \in \mathbb{N}.\end{cases} \qquad (5)$$

This is a $D(\mathbb{R})$-version of the $J_1$-topology, originated by A. V. Skorokhod [Sko56]. This is the only topology we consider on D, and we subsequently refer to it as the Skorokhod topology. For details on different topologies on D see for example [KY99]. Compare the above characterization to 1.14, page 328, in [JS02]. Note that, unlike in $D[0,\infty)$, the point 0 does not play a special role in $D(\mathbb{R})$.

Since the identity is an element of $\Lambda$, uniform convergence on compact sets implies Skorokhod convergence. In fact, the Skorokhod topology is strictly coarser than the topology of locally uniform convergence (uniform topology), i.e. there are fewer open sets and more convergent sequences. For instance, $\{\mathbb{1}_{[1/n,\infty)} \mid n \in \mathbb{N}\}$ converges to $\mathbb{1}_{[0,\infty)}$ in the Skorokhod topology, but not in the uniform topology. As mentioned before, by writing D we always mean the topological space. Let $\mathcal{D}$ be its Borel σ-field. The topological space D is separable (whereas the set D endowed with the uniform topology is not separable), completely metrizable and in this sense a Polish space. See e.g. [Bil99] for a complete metric.
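The example $\mathbb{1}_{[1/n,\infty)} \to \mathbb{1}_{[0,\infty)}$ can be checked against (5) numerically. In the Python sketch below, $\lambda_n$ is one hypothetical choice of time change in $\Lambda$: piecewise linear, with $\lambda_n(0) = 1/n$, equal to the identity outside $[-1, 1]$, and strictly increasing for $n \ge 2$. Suprema are approximated on a finite grid of $[-2, 2]$.

```python
def x_n(n):
    """Indicator of [1/n, infinity)."""
    def x(t):
        return 1.0 if t >= 1.0 / n else 0.0
    return x


def x_limit(t):
    """Indicator of [0, infinity)."""
    return 1.0 if t >= 0.0 else 0.0


def time_change(n):
    """Hypothetical lambda_n (n >= 2): piecewise linear, lambda_n(0) = 1/n, identity outside [-1, 1]."""
    def lam(t):
        if t <= -1.0 or t >= 1.0:
            return t
        if t >= 0.0:
            return 1.0 / n + t * (1.0 - 1.0 / n)      # maps [0, 1] onto [1/n, 1]
        return -1.0 + (t + 1.0) * (1.0 + 1.0 / n)     # maps [-1, 0] onto [-1, 1/n]
    return lam


grid = [k / 1000 for k in range(-2000, 2001)]          # grid on [-2, 2]
for n in [2, 10, 100]:
    xn, lam = x_n(n), time_change(n)
    sup_dist = max(abs(xn(u) - x_limit(u)) for u in grid)        # uniform distance
    lam_dist = max(abs(lam(u) - u) for u in grid)                 # sup |lambda_n(t) - t|
    warped = max(abs(xn(lam(u)) - x_limit(u)) for u in grid)      # sup |x_n(lambda_n(t)) - x(t)|
    print(n, sup_dist, lam_dist, warped, xn(0.0), x_limit(0.0))
```

The uniform distance stays equal to 1 for every n. Meanwhile $\sup_t |\lambda_n(t) - t| = 1/n \to 0$, and $x_n \circ \lambda_n$ coincides with the limit, so both requirements in (5) are met. The last two printed values show $\pi_0(x_n) = 0$ while $\pi_0(x) = 1$.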

Let $\pi_t$ denote the projection $\pi_t : D \to \mathbb{R} : x \mapsto x(t)$. For any probability measure P on $(D, \mathcal{D})$ let $T_P$ be the set of all points $t \in \mathbb{R}$ for which $\pi_t$ is P-almost surely continuous. We call $P_X$ the distribution of any random variable X in D and write $T_X$ for $T_{P_X}$.

Lemma 3.1 The complement of $T_X$ in $\mathbb{R}$ is at most countable.

See e.g. [Bil99], page 174. It follows that $T_X$ is dense.

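Definition 3.2 In D we say that the finite-dimensional distributions (fidis) of $X_n$ converge to those of X, and write $X_n \xrightarrow{\text{fidi}} X$, if

$$\big(X_n(t_1), \dots, X_n(t_k)\big) \xrightarrow{\mathcal{L}} \big(X(t_1), \dots, X(t_k)\big) \qquad (6)$$

for all $k \in \mathbb{N}$ and $t_1 < \dots < t_k \in T_X$.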

Remarks. 

(I) We restrict $t_1, \dots, t_k$ to lie in $T_X$, because $X_n \xrightarrow{\mathcal{L}} X$ in D does not necessarily imply $\pi_t(X_n) \xrightarrow{\mathcal{L}} \pi_t(X)$, cf. e.g. [JS02], page 349, 3.14. This is due to the fact that $\pi_t$ (a function from D to $\mathbb{R}$) is continuous at a point x only if x (a function from $\mathbb{R}$ to $\mathbb{R}$) is continuous at t, cf. [Bil99], page 134, Theorem 12.5 (i). Think, for instance, of

$$\mathbb{1}_{[1/n,\infty)} \to \mathbb{1}_{[0,\infty)} \ \text{ in } D, \quad \text{but} \quad \pi_0\big(\mathbb{1}_{[1/n,\infty)}\big) = 0 \not\to 1 = \pi_0\big(\mathbb{1}_{[0,\infty)}\big).$$

(II) Definition 3.2 is equivalent to the following: there exists a dense subset S of $\mathbb{R}$ such that (6) holds for all finite subsets $\{t_1, \dots, t_k\}$ of S, see [JS02], page 350, 3.19. In this sense, $\xrightarrow{\text{fidi}}$ does not depend on its right-hand side.

Now here is a characterization of weak convergence in D. It is phrased in terms of random variables and convergence in law, which is equivalent to the weak convergence of the respective distributions.

Proposition 3.3 Let $\{X_n\}$ be a sequence of random variables in $(D, \mathcal{D})$ and X a random variable in $(D, \mathcal{D})$ with the following two properties:

(1) $\{X_n\}$ is tight.

(2) $X_n \xrightarrow{\text{fidi}} X$.

Then $X_n \xrightarrow{\mathcal{L}} X$.

In short, convergence of the fidis (in the sense of Definition 3.2) and tightness together imply convergence in law. These two conditions are sufficient and necessary, cf. e.g. [JS02], page 350, or [Bil99], page 139.


















4 Tightness in D 

In order to make use of Proposition 3.3 we need a handy tightness criterion. Recall the notion of tightness: a family $\mathcal{P}$ of probability measures on the Borel σ-field of a metric space is tight if, for every $\varepsilon > 0$, there exists a compact set K such that $P(K) > 1 - \varepsilon$ for every $P \in \mathcal{P}$. A family of random variables is tight if the family of their respective distributions is tight. Prokhorov's theorem tells us that in complete separable metric spaces, like D, tightness is equivalent to relative compactness. A family $\mathcal{P}$ of probability measures is relatively compact if every sequence in $\mathcal{P}$ contains a weakly convergent subsequence. The limit need not lie in $\mathcal{P}$.

We present three criteria which allow one to confirm that a given sequence of random variables in D is tight. The first is, in fact, a characterization of tightness.

4.1 A tightness characterization in D 

First we need to introduce some notation. We will be dealing with intervals of the type $[-m, m]$, where $m \in \mathbb{N}$.

For an arbitrary function $x : \mathbb{R} \to \mathbb{R}$ and an arbitrary set $T \subset \mathbb{R}$ we define

$$w(x, T) = \sup_{s, t \in T} |x(s) - x(t)|. \qquad (7)$$

We call any finite set $\sigma = \{s_0, \dots, s_k\} \subset \mathbb{R}$ satisfying $-m = s_0 < s_1 < \dots < s_k = m$ a grid on $[-m, m]$. If

$$s_i - s_{i-1} > \delta \quad \text{for all } i = 2, \dots, k-1,$$

i.e. all intervals except those at the left and right end are wider than $\delta$, we call the grid $\delta$-sparse. Let $\mathcal{S}(m, \delta)$ be the set of all $\delta$-sparse grids on $[-m, m]$ and define the following modulus:

$$w_m(x, \delta) = \inf_{\sigma \in \mathcal{S}(m,\delta)}\ \max_{1 \le i \le k} w\big(x, [s_{i-1}, s_i)\big). \qquad (8)$$
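For step functions, the quantities in (7) and (8) can be computed by hand, which may help to read the definition of $w_m$. The Python sketch below (all names hypothetical) represents a right-continuous step function with finitely many jumps by its value c at $-\infty$ and a list of (jump time, jump size) pairs. It checks $\delta$-sparsity and evaluates $\max_i w(x, [s_{i-1}, s_i))$ for a given grid. Since $w_m(x, \delta)$ is the infimum of this quantity over all $\delta$-sparse grids, any single grid only yields an upper bound.

```python
def step_value(c, jumps, t):
    """x(t) = c + total size of the jumps at times <= t (right-continuous step function)."""
    return c + sum(size for time, size in jumps if time <= t)


def oscillation(c, jumps, a, b):
    """w(x, [a, b)) from (7): x is constant between jumps, so compare x(a) with x at jumps in (a, b)."""
    values = [step_value(c, jumps, a)]
    values += [step_value(c, jumps, time) for time, _ in jumps if a < time < b]
    return max(values) - min(values)


def is_sparse(grid, delta):
    """delta-sparse: s_i - s_{i-1} > delta for i = 2, ..., k-1 (first and last interval exempt)."""
    return all(grid[i] - grid[i - 1] > delta for i in range(2, len(grid) - 1))


def grid_modulus(c, jumps, grid):
    """max over 1 <= i <= k of w(x, [s_{i-1}, s_i)) for one grid; w_m is the inf over sparse grids."""
    return max(oscillation(c, jumps, grid[i - 1], grid[i]) for i in range(1, len(grid)))


# Hypothetical counting path with unit jumps at -0.5 and 0.3, examined on [-1, 1] (m = 1).
jumps = [(-0.5, 1), (0.3, 1)]
adapted = [-1.0, -0.5, 0.3, 1.0]   # grid points placed at the jump times
coarse = [-1.0, 0.0, 1.0]
print(is_sparse(adapted, 0.2), grid_modulus(0, jumps, adapted))   # True 0
print(is_sparse(coarse, 0.2), grid_modulus(0, jumps, coarse))     # True 1
```

Placing grid points at the jump times gives 0, hence $w_m(x, 0.2) = 0$ for this path. Placing grid points at two jump times that are at most $\delta$ apart would violate $\delta$-sparsity, unless the interval between them is the first or the last one.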

Theorem 4.1 A sequence of random variables $\{X_n\}$ in $(D, \mathcal{D})$ is tight if and only if the following two conditions hold.

(1) For all t in a dense subset $T_0$ of $\mathbb{R}$,

$$\lim_{a \to \infty} \limsup_{n} P\big(|X_n(t)| > a\big) = 0,$$

(2) and for every $m \in \mathbb{N}$ and $\varepsilon > 0$,

$$\lim_{\delta \to 0} \limsup_{n} P\big(w_m(X_n, \delta) > \varepsilon\big) = 0.$$

Proof. Cf. [Bil99], Theorem 16.8, in combination with the subsequent corollary. ■
4.2 A moment-type tightness criterion

Proposition 4.2 Let X and $X_n$, $n \in \mathbb{N}$, be random variables in $(D, \mathcal{D})$. Suppose that

(1) $X_n \xrightarrow{\text{fidi}} X$, and

(2) there exist a non-decreasing, continuous function $H : \mathbb{R} \to \mathbb{R}$ and real numbers $a > 1$ and $b > 0$ such that

$$\mathbb{E}\big(|X_n(s) - X_n(r)|^{b}\,|X_n(t) - X_n(s)|^{b}\big) \le \big(H(t) - H(r)\big)^{a}$$

holds for all $r \le s \le t$ and $n \ge 1$.

Then $\{X_n\}$ is tight.

Proof. The $D[0,1]$ version of this proposition is Theorem 13.5 on page 142 in [Bil99]. The proof is also worked out in detail for the $D(\mathbb{R})$ case in [Vog05].

It needs to be shown that 4.2 (1), (2) imply 4.1 (1), (2). In fact, 4.1 (1) already follows from 4.2 (1), and 4.1 (2) follows from 4.2 (2). And of course, by Proposition 3.3, under the assumptions of Proposition 4.2 we have $X_n \xrightarrow{\mathcal{L}} X$ in $(D, \mathcal{D})$. ■


4.3 A point-process tightness criterion 


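Let $\mathcal{T}$ be the set of all non-decreasing sequences $\{t_z \mid z \in \mathbb{Z}\}$ that meet the following restrictions: $t_z \in [-\infty, \infty]$ for all $z \in \mathbb{Z}$, $t_0 < 0 < t_1$, $t_z \to \pm\infty$ as $z \to \pm\infty$, and $\{t_z\}$ is strictly increasing where it is not $\pm\infty$. Then define the following two classes of functions:

$$Y^{+1} = \Big\{x : \mathbb{R} \to \mathbb{R} \ \Big|\ x = c - \sum_{z=-\infty}^{0} \mathbb{1}_{(-\infty,\, t_z)} + \sum_{z=1}^{\infty} \mathbb{1}_{[t_z,\, \infty)},\ c \in \mathbb{Z},\ \{t_z\} \in \mathcal{T}\Big\},$$

$$Y^{+} = \Big\{x : \mathbb{R} \to \mathbb{R} \ \Big|\ x = c - \sum_{z=-\infty}^{0} c_z\, \mathbb{1}_{(-\infty,\, t_z)} + \sum_{z=1}^{\infty} c_z\, \mathbb{1}_{[t_z,\, \infty)},\ c \in \mathbb{Z},\ \{t_z\} \in \mathcal{T},\ c_z \in \mathbb{N} \text{ for all } z \in \mathbb{Z}\Big\}.$$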


Evidently, $Y^{+1} \subset Y^{+} \subset D$. The set $Y^{+}$ also admits the following characterization: it consists of all elements of D that are non-decreasing and integer-valued. Then, by employing (5), it is easy to see that the potential limit of any sequence $\{x_n\}$ in $Y^{+}$ has this property, too. Hence $Y^{+}$ is closed in D and therefore measurable. As for $Y^{+1}$, note that Skorokhod convergence $x_n \to x$ implies that for all $t \in \mathbb{R}$ there exists a sequence $\{t_n\} \subset \mathbb{R}$ such that $t_n \to t$ and

$$x_n(t_n) - x_n(t_n-) \to x(t) - x(t-),$$

cf. [JS02], page 337, 2.1. Hence, if $\{x_n\} \subset Y^{+1}$, the limit x can only have jumps of size 1 as well: $Y^{+1}$ is closed in D. We call a random function whose paths lie almost surely in $Y^{+1}$ a counting process.
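Membership in $Y^{+}$ and $Y^{+1}$ is easy to test for paths with finitely many jumps. The Python sketch below (names hypothetical) encodes such a path by its value c at $-\infty$ (instead of the value at 0 used in the definition) and a list of (time, size) jumps. This finite encoding is our simplification. It covers, for instance, every path of $\beta_n$, but not a Poisson path on all of $\mathbb{R}$. The sketch shows why ties in the sample put $\beta_n$ into $Y^{+}$ rather than $Y^{+1}$. Via the largest jump, it also shows why, by the jump-matching property above, a sequence in $Y^{+1}$ with unit jumps at 1 and $1 + 1/n$ cannot converge in D to $2 \cdot \mathbb{1}_{[1,\infty)}$.

```python
from numbers import Integral


def in_y_plus(c, jumps):
    """Path x(t) = c + sum of sizes of jumps at times <= t: integer start, positive integer jumps."""
    return isinstance(c, Integral) and all(isinstance(size, Integral) and size >= 1 for _, size in jumps)


def in_y_plus_one(c, jumps):
    """Y^{+1}: additionally, all jump times are distinct and every jump has size exactly 1."""
    times = [time for time, _ in jumps]
    return in_y_plus(c, jumps) and len(set(times)) == len(times) and all(size == 1 for _, size in jumps)


def beta_path(sample, tau):
    """Encode the path of beta_n: each X_k != tau gives a unit jump at n (X_k - tau); ties merge."""
    n = len(sample)
    merged = {}
    for v in sample:
        if v != tau:
            time = n * (v - tau)
            merged[time] = merged.get(time, 0) + 1
    start = -sum(1 for v in sample if v < tau)      # value of beta_n as t -> -infinity
    return start, sorted(merged.items())


start, jumps = beta_path([0.2, 0.49, 0.5, 0.5, 0.51, 0.51, 0.7], tau=0.5)
print(start, jumps, in_y_plus(start, jumps), in_y_plus_one(start, jumps))

# Unit jumps at 1 and 1 + 1/n versus the candidate limit 2 * 1_[1, inf).
for n in (10, 100, 1000):
    path = [(1.0, 1), (1.0 + 1.0 / n, 1)]
    print(n, in_y_plus_one(0, path), max(size for _, size in path))
candidate = [(1.0, 2)]
print(in_y_plus(0, candidate), in_y_plus_one(0, candidate), max(size for _, size in candidate))
```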
Proposition 4.3 Let X and $X_n$, $n \in \mathbb{N}$, be random variables in $(D, \mathcal{D})$. Suppose that

(1) $X_n \xrightarrow{\text{fidi}} X$,

(2) $P(X \in Y^{+1}) = 1$, and

(3) $P(X_n \in Y^{+}) = 1$, $n \in \mathbb{N}$.

Then $\{X_n\}$ is tight.

This is a generalization of Theorem 3.37, page 354, in [JS02]. Basically, [JS02] consider the space $D[0,\infty)$ and require $X_n$, $n \in \mathbb{N}$, also to be counting processes in the above sense. A proposition of exactly the same type as ours (X has jumps of size 1, $X_n$ has integer-valued jumps) can be found in [CH88]. For the sake of completeness we present an alternative proof.

Proof of Proposition 4.3. We apply, of course, Theorem 4.1. The implication 4.3 (1) $\Rightarrow$ 4.1 (1) is straightforward, cf. e.g. the proof of Theorem 13.3 in [Bil99]. We only derive condition 4.1 (2) here.

Let $m \in \mathbb{N}$ and, initially, also $\delta > 0$ be fixed. Then choose a $\delta$-sparse grid $\sigma = \{s_0, \dots, s_k\}$ on $[-m, m]$ subject to the following additional restrictions:

$$s_i - s_{i-1} < 2\delta, \quad i = 1, \dots, k, \qquad \text{and} \qquad s_1, \dots, s_{k-1} \in T_X.$$

The latter is always possible, since $T_X$ is dense in $\mathbb{R}$, but $s_0 = -m \in T_X$ or $s_k = m \in T_X$ need not hold. Now define the following two sets:


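$$A = \big\{(u_1, \dots, u_{k-1}) \ \big|\ \dots,\ i = 2, \dots, k-2\big\} \subset \mathbb{R}^{k-1},$$

$$A_\sigma = \big\{x \in Y^{+} \ \big|\ (x(s_1), \dots, x(s_{k-1})) \in A\big\} \subset Y^{+}.$$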


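With these constructions the proof breaks down into two steps. First we show

(a) $\limsup_n P\big(w_m(X_n, \delta) > \varepsilon\big) \le P(X \notin A_\sigma)$ for all positive $\varepsilon$, and then

(b) $P(X \notin A_\sigma) \to 0$ as $\delta \to 0$.

Part (a): By construction of $A_\sigma$ we have $x \in A_\sigma \Rightarrow w_m(x, \delta) = 0$, which implies

$$P(X_n \notin A_\sigma) \ge P\big(w_m(X_n, \delta) > \varepsilon\big) \quad \forall\, n \in \mathbb{N},\ \varepsilon > 0.$$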

Since we have chosen $s_i \in T_X$, $i = 1, \dots, k-1$,

$$\big(X_n(s_1), \dots, X_n(s_{k-1})\big) \xrightarrow{\mathcal{L}} \big(X(s_1), \dots, X(s_{k-1})\big).$$
The set A is open in $\mathbb{R}^{k-1}$, and hence, by the Portmanteau theorem,

$$P(X \notin A_\sigma) \ge \limsup_n P(X_n \notin A_\sigma) \ge \limsup_n P\big(w_m(X_n, \delta) > \varepsilon\big) \quad \forall\, \varepsilon > 0.$$

Part (b): Define $T_z$, $z \in \mathbb{Z}$, to be the jump times of X (we understand them as random variables in $[-\infty, \infty]$), where we count as follows:

$$\dots < T_{-2} < T_{-1} < T_0 < 0 < T_1 < T_2 < \dots$$

Now consider the following events:

$$B_n = \{T_{-n} < -m,\ m < T_n\}, \quad n \in \mathbb{N},$$

$$C_{\delta,n} = \{T_i - T_{i-1} > 4\delta,\ i = -n+1, \dots, 0, \dots, n\}, \quad n \in \mathbb{N},\ \delta > 0.$$

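It holds that

$$\forall\, \varepsilon > 0\ \exists\, n \in \mathbb{N}: \quad P(B_n) > 1 - \tfrac{\varepsilon}{2}, \qquad (9)$$

$$\forall\, \varepsilon > 0,\ n \in \mathbb{N}\ \exists\, \delta > 0: \quad P(C_{\delta,n}) > 1 - \tfrac{\varepsilon}{2}, \qquad (10)$$

$$\{X \in Y^{+1}\} \cap B_n \cap C_{\delta,n} \subset \{X \in A_\sigma\}. \qquad (11)$$

Since $P(X \in Y^{+1}) = 1$, (9), (10) and (11) together imply

$$\forall\, \varepsilon > 0\ \exists\, \delta > 0: \quad P(X \notin A_\sigma) < \varepsilon.$$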

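It remains to show (9) and (10). Both follow by the same principle from the fact that X is a counting process, which we will exemplify with (9). Assume the opposite is true:

$$\exists\, \varepsilon > 0\ \forall\, n \in \mathbb{N}: \quad P(B_n) \le 1 - \tfrac{\varepsilon}{2}.$$

Since $\{B_n \mid n \in \mathbb{N}\}$ is an increasing sequence of sets,

$$P\Big(\bigcap_n B_n^{\complement}\Big) \ge \tfrac{\varepsilon}{2},$$

where $B^{\complement}$ denotes the complement of a set B. On the event $\bigcap_n B_n^{\complement}$, infinitely many of the $T_z$, $z \in \mathbb{Z}$, lie in $[-m, m]$. By definition of the set $\mathcal{T}$ this is a contradiction to $X \in Y^{+1}$. ■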
5 Weak Convergence in D × D

In Theorem 5.2 we will give a weak convergence characterization in $D \times D$ of the same type as Proposition 3.3. The set $D \times D$ is the collection of all pairs $(x, y)$, where $x, y \in D$. It is endowed with the product topology, i.e.

$$(x_n, y_n) \to (x, y) \iff \begin{cases} x_n \to x,\\ y_n \to y.\end{cases} \qquad (12)$$

Again, by writing $D \times D$ we refer to the topological space. $D \times D$ is a Polish space. The following is important:

Lemma 5.1 The Borel σ-field on $D \times D$ coincides with the product σ-field $\mathcal{D} \otimes \mathcal{D}$.

Proof. See e.g. [Els02], Theorem 5.10, page 115. Separability is needed. ■

Remark. One can identify the pair of functions $(x, y)$ with the function $f_{xy} : \mathbb{R} \to \mathbb{R}^2 : t \mapsto (x(t), y(t))$. The function $f_{xy}$ is càdlàg from $\mathbb{R}$ to $\mathbb{R}^2$ if and only if $x, y \in D$. We call the space of such functions $D(\mathbb{R}, \mathbb{R}^2)$. The generalization is straightforward: the convergence characterization reads exactly as (5), only $x_n(\lambda_n(t))$ and $x(t)$ are $\mathbb{R}^2$-valued. In fact, the codomain can easily be replaced by any Polish space without having to change anything.

If we identify $(x, y) \equiv f_{xy}$, the sets $D \times D$ and $D(\mathbb{R}, \mathbb{R}^2)$ are equal. However, the Skorokhod topology on $D(\mathbb{R}, \mathbb{R}^2)$ is strictly finer than the product topology on $D \times D$, i.e. it has fewer convergent sequences. Take, for instance, $x_n = \mathbb{1}_{[1,\infty)}$ and $y_n = \mathbb{1}_{[1+1/n,\infty)}$. The pairs converge in $D \times D$, but the functions $f_{x_n y_n}$, which jump at two distinct times, do not converge in $D(\mathbb{R}, \mathbb{R}^2)$. However, both topologies induce the same Borel σ-field, cf. [KY99]. In this paper we are not at all concerned with the space $D(\mathbb{R}, \mathbb{R}^2)$. We deal with pairs of random variables and their convergence in law, which we want to be of the same type as (12). The product topology has to be our concern.

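Theorem 5.2 Let $\{Z_n = (X_n, Y_n)\}$ be a sequence of random variables in $(D \times D, \mathcal{D} \otimes \mathcal{D})$. If

(1) the sequences $\{X_n\}$ and $\{Y_n\}$ are tight, and

(2) there is a random variable $Z = (X, Y)$ in $(D \times D, \mathcal{D} \otimes \mathcal{D})$ such that

$$\big(X_n(t_1), \dots, X_n(t_k), Y_n(t_1), \dots, Y_n(t_k)\big) \xrightarrow{\mathcal{L}} \big(X(t_1), \dots, X(t_k), Y(t_1), \dots, Y(t_k)\big)$$

for all $k \in \mathbb{N}$ and $t_1, \dots, t_k \in T_X \cap T_Y$,

then $Z_n \xrightarrow{\mathcal{L}} Z$.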

The rest of the section is devoted to the proof of Theorem 5.2. It is more convenient to formulate the proof in terms of probability measures than random variables. Therefore, let $P$, $P_n$, $P^{(1)}$, $P_n^{(1)}$, $P^{(2)}$ and $P_n^{(2)}$ be the distributions of $Z$, $Z_n$, $X$, $X_n$, $Y$ and $Y_n$, $n \in \mathbb{N}$, respectively.

One thing to note about the theorem is that in 5.2 (1) we only require $\{X_n\}$ and $\{Y_n\}$ individually to be tight. This of course implies tightness of the joint sequence:
Lemma 5.3 If $\{X_n\}$ and $\{Y_n\}$ are tight sequences of random variables in $(D, \mathcal{D})$, then $\{Z_n = (X_n, Y_n)\}$ is tight in $(D \times D, \mathcal{D} \otimes \mathcal{D})$.

Proof (a corollary of Tikhonov's theorem). For any $\varepsilon > 0$ we find compact sets $K_1, K_2 \subset D$ such that $P_n^{(i)}(K_i) > 1 - \frac{\varepsilon}{2}$ for all $n \in \mathbb{N}$, $i = 1, 2$. Then $P_n(K_1 \times K_2) > 1 - \varepsilon$ for all $n \in \mathbb{N}$, and $K_1 \times K_2$ is compact in $D \times D$ by Tikhonov's theorem. ■

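We now introduce projections. Let $T = \{t_1, \dots, t_k\}$ and $S = \{s_1, \dots, s_l\}$, where $t_1 < \dots < t_k$ and $s_1 < \dots < s_l$. Define

$$\pi_T : D \to \mathbb{R}^k : x \mapsto \big(x(t_1), \dots, x(t_k)\big) = \big(\pi_{t_1}(x), \dots, \pi_{t_k}(x)\big)$$

and

$$\pi_{S,T} : D \times D \to \mathbb{R}^{l+k} : (x, y) \mapsto \big(\pi_S(x), \pi_T(y)\big) = \big(x(s_1), \dots, x(s_l), y(t_1), \dots, y(t_k)\big).$$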

Then 5.2 (2) can be written as

$$P_n \circ \pi_{T,T}^{-1} \xrightarrow{w} P \circ \pi_{T,T}^{-1} \quad \forall\, T \subset T_X \cap T_Y,\ |T| < \infty.$$

Lemma 5.4 If $T \subset T_X \cap T_Y$, T finite, then $\pi_{T,T}$ is P-a.e. continuous.

Proof. Let $\Delta$ be the discontinuity set of $\pi_T$, i.e. the set of all points $x \in D$ at which $\pi_T$ is not continuous. The function $\pi_{T,T}$ is continuous at a point $(x, y) \in D \times D$ if and only if $\pi_T$ is continuous at x and at y. Hence the discontinuity set of $\pi_{T,T}$ is $(\Delta \times D) \cup (D \times \Delta)$. Due to our assumption $T \subset T_X \cap T_Y$ we have

$$P\big((\Delta \times D) \cup (D \times \Delta)\big) \le P(\Delta \times D) + P(D \times \Delta) = P^{(1)}(\Delta) + P^{(2)}(\Delta) = 0.$$

■

The proof of Theorem 5.2 furthermore requires a few measure-theoretic concepts.

Definition 5.5 Let $(\Omega, \mathcal{A})$ be a measurable space. Any subclass $\mathcal{E}$ of $\mathcal{A}$ that satisfies

$$\mu|_{\mathcal{E}} = \nu|_{\mathcal{E}} \implies \mu = \nu$$

for any two probability measures $\mu$ and $\nu$ on $\mathcal{A}$ is called a separating class for $\mathcal{A}$.

If $\mu$ and $\nu$ differ, then $\mathcal{E}$ already suffices to separate them. Recall that, if a system of sets $\mathcal{E} \subset \mathcal{A}$ generates the σ-field $\mathcal{A}$ and is closed under the formation of finite intersections (i.e. is a π-system), then $\mathcal{E}$ is a separating class for $\mathcal{A}$, cf. e.g. [Bil99], page 9.

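Lemma 5.6 If $\mathcal{E}_1$ and $\mathcal{E}_2$ are separating classes for the σ-fields $\mathcal{A}_1$ and $\mathcal{A}_2$, respectively, then so is $\mathcal{E}_1 \times \mathcal{E}_2$ for $\mathcal{A}_1 \otimes \mathcal{A}_2$.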
Proof. We have to show that the two properties, π-system and generating class, extend from the marginals to the product. The former is apparent; for the latter see e.g. [Bau92], Theorem 22.1, page 151. ■

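For any $T_0 \subset \mathbb{R}$ let

$$\mathcal{F}(T_0) = \big\{\pi_T^{-1}(A) \mid A \in \mathcal{B}(\mathbb{R}^{|T|}),\ T \subset T_0,\ |T| < \infty\big\}$$

and

$$\mathcal{H}(T_0) = \big\{\pi_{T,T}^{-1}(A) \mid A \in \mathcal{B}(\mathbb{R}^{2|T|}),\ T \subset T_0,\ |T| < \infty\big\}.$$

$\mathcal{F}(T_0)$ and $\mathcal{H}(T_0)$ are subclasses of $\mathcal{D}$ and $\mathcal{D} \otimes \mathcal{D}$, respectively, cf. [Bil99], Theorem 16.6. Roughly, the next two lemmas tell us that $\mathcal{F}(T_0)$ and $\mathcal{H}(T_0)$ are "large enough" if $T_0$ is "large enough".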

Lemma 5.7 If $T_0$ is dense in $\mathbb{R}$, then $\mathcal{F}(T_0)$ is a separating class for $\mathcal{D}$.

Proof. See [Bil99], page 170, Theorem 16.6. ■

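Lemma 5.8 If $T_0$ is dense in $\mathbb{R}$, then $\mathcal{H}(T_0)$ is a separating class for $\mathcal{D} \otimes \mathcal{D}$.

Proof. By Lemmas 5.6 and 5.7, $\mathcal{F}(T_0) \times \mathcal{F}(T_0)$ is a separating class for $\mathcal{D} \otimes \mathcal{D}$. It remains to see that $\mathcal{F}(T_0) \times \mathcal{F}(T_0) \subset \mathcal{H}(T_0)$. Towards this end we introduce the class

$$\mathcal{G}(T_0) = \big\{\pi_{S,T}^{-1}(A) \mid A \in \mathcal{B}(\mathbb{R}^{|S|+|T|}),\ S, T \subset T_0,\ |S|, |T| < \infty\big\}.$$

Evidently $\mathcal{F}(T_0) \times \mathcal{F}(T_0) \subset \mathcal{G}(T_0)$. Furthermore, $\mathcal{G}(T_0) = \mathcal{H}(T_0)$. This is because any set $\pi_{S,T}^{-1}(A) \in \mathcal{G}(T_0)$ can also be written as $\pi_{S \cup T,\, S \cup T}^{-1}(C)$ for an appropriate set $C \subset \mathbb{R}^{2|S \cup T|}$. ■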

This concludes the preliminaries, and we present the 

Proof of Theorem 5.2. By assumption, $\{P_n^{(1)}\}$ and $\{P_n^{(2)}\}$ are both tight, hence $\{P_n\}$ is tight (Lemma 5.3). By Prokhorov's theorem, $\{P_n\}$ is relatively compact. (Here it is important that $\mathcal D \otimes \mathcal D$ coincides with the Borel $\sigma$-field on $D \times D$, cf. Lemma 5.1.) Every subsequence $\{P_{n'}\}$ has a further subsequence $\{P_{n''}\}$ that converges weakly, i.e. there is a probability measure $Q$ on $(D \times D, \mathcal D \otimes \mathcal D)$ (which of course depends on $\{n''\}$) such that

$$P_{n''} \xrightarrow{w} Q.$$

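Lemma 5.4, applied to $Q$ in place of $P$, allows us to apply the CMT:

$$P_{n''} \circ \pi_{T,T}^{-1} \xrightarrow{w} Q \circ \pi_{T,T}^{-1} \quad \text{for all finite } T \subset T_{Q^{(1)}} \cap T_{Q^{(2)}}.$$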
On the other hand, Theorem 5.2 (2) implies

$$P_{n''} \circ \pi_{T,T}^{-1} \xrightarrow{w} P \circ \pi_{T,T}^{-1} \quad \text{for all finite } T \subset T_{P^{(1)}} \cap T_{P^{(2)}}.$$

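This means that, if we let $T_0 = T_{Q^{(1)}} \cap T_{Q^{(2)}} \cap T_{P^{(1)}} \cap T_{P^{(2)}}$, then, by uniqueness of weak limits, $P$ and $Q$ agree on the class $\mathcal H(T_0)$. The set $T_0$ is dense in $\mathbb R$ (a corollary of a previous lemma), thus $\mathcal H(T_0)$ is a separating class for $\mathcal D \otimes \mathcal D$ (Lemma 5.8), hence $P = Q$.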

Thus all subsequences of $\{P_n\}$ contain a weakly convergent sub-subsequence, and all of these sub-subsequences converge to the same limit $P$. It follows that $P_n$ converges weakly to $P$, cf. [Bil99], Theorem 2.6, page 20. ■


6 Proof of Theorem 2.1

Proof of Theorem 2.1. By Theorem 5.2, the proof comes down to showing

(A) $\{\alpha_n^F\}$ is tight,

(B) $\{\beta_n^{F,\tau}\}$ is tight, and

(C) the fidis of $(\alpha_n^F, \beta_n^{F,\tau})$ converge to those of $(B_F, N_0^{\varrho_1,\varrho_2})$ in the sense of Theorem 5.2 (2).

Part (B) is an immediate corollary of the point-process tightness criterion 4.3. Result (A) does not follow as straightforwardly: in a first step we use the moment-type criterion 4.2 to show that it holds for continuous $F$ (Lemma 6.1), and then, building on that, we show it for arbitrary distribution functions $F$ (Proposition 6.3). Part (C) is the subject of Proposition 6.4. ■

Lemma 6.1 If $F$ is continuous, the sequence of random variables $\{\alpha_n^F\}$ in $(D, \mathcal D)$ is tight.

Proof. Some straightforward calculations yield

$$\mathbb E\bigl(|\alpha_n(s) - \alpha_n(r)|^2\,|\alpha_n(t) - \alpha_n(s)|^2\bigr) \le 6\,\bigl(F(t) - F(r)\bigr)^2 \qquad (13)$$

for all $r < s < t$ and all $n \in \mathbb N$, cf. [Bil99], page 150. Now we apply Proposition 4.2. Condition 4.2 (1) is fulfilled as a corollary of Proposition 6.4, or as a simple exercise using the multivariate CLT. Condition 4.2 (2) follows from (13) with $a = b = 2$ and $H = \sqrt6\,F$. ■
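For small $n$ the bound (13) can be checked by exact enumeration: the numbers of observations in $(r,s]$ and $(s,t]$ are multinomially distributed with cell probabilities $p_1 = F(s)-F(r)$ and $p_2 = F(t)-F(s)$, so the left-hand side of (13) is a finite sum. The Python sketch below (the function name is ours) is only a sanity check for a few parameter values, not a substitute for the calculation cited from [Bil99].

```python
from math import comb, sqrt

def increment_moment(n, p1, p2):
    """Exact E[(a_n(s)-a_n(r))^2 (a_n(t)-a_n(s))^2] for a_n = sqrt(n)(F_n - F),
    where p1 = F(s)-F(r), p2 = F(t)-F(s). The counts k1, k2 of observations in
    (r,s] and (s,t] are multinomial(n; p1, p2, 1-p1-p2)."""
    p0 = 1.0 - p1 - p2
    total = 0.0
    for k1 in range(n + 1):
        for k2 in range(n - k1 + 1):
            k0 = n - k1 - k2
            pmf = comb(n, k1) * comb(n - k1, k2) * p1**k1 * p2**k2 * p0**k0
            d1 = (k1 - n * p1) / sqrt(n)   # a_n(s) - a_n(r)
            d2 = (k2 - n * p2) / sqrt(n)   # a_n(t) - a_n(s)
            total += pmf * d1**2 * d2**2
    return total

for n in (1, 5, 50):
    for p1, p2 in ((0.1, 0.1), (0.3, 0.5), (0.01, 0.6)):
        # exact left-hand side of (13) next to the right-hand side 6 (p1 + p2)^2
        print(n, p1, p2, round(increment_moment(n, p1, p2), 5), "<=", 6 * (p1 + p2) ** 2)
```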


Lemma 6.2 Let $F$ be an arbitrary cdf. The quantile transformation $Q_F\colon D[0,1] \to D(\mathbb R)$, $x \mapsto x \circ F$ (i.e. $Q_F(x)$ is the function $t \mapsto x(F(t))$), is $\mathcal L(B_0)$-a.e. continuous, where $\mathcal L(B_0)$ denotes the distribution of the Brownian bridge $B_0$.
Proof. We show that $Q_F$ is continuous as a function from $(C[0,1], \|\cdot\|)$ to $D(\mathbb R)$. This suffices because $P(B_0 \in C[0,1]) = 1$ and the Skorokhod topology coincides with the uniform topology (the one induced by the sup-norm $\|\cdot\|$) on $C[0,1]$, cf. [Bil99], page 124. Now let $x_n \to x$ in $(C[0,1], \|\cdot\|)$. Since the image of $F$ is a subset of $[0,1]$, this implies

$$\sup_{t \in \mathbb R} \bigl|x_n(F(t)) - x(F(t))\bigr| \le \|x_n - x\| \to 0.$$

Hence (5) is satisfied with $\lambda_n = \mathrm{id}$ (the identity function on $\mathbb R$), and we have $Q_F(x_n) \to Q_F(x)$ in $D(\mathbb R)$. ■

Proposition 6.3 The sequence of random variables $\{\alpha_n^F\}$ in $(D, \mathcal D)$ is tight for all distribution functions $F$.

Proof. The cdf of the uniform(0,1) distribution, which we call $G$, is continuous. Thus by 6.1, 6.4 and 3.3 we have

$$\alpha_n^G \xrightarrow{\mathcal D} B_G \quad \text{in } D(\mathbb R).$$

If we restrict these processes to the time domain $[0,1]$, the convergence remains true, cf. [Bil99], page 174, Theorem 16.7. Furthermore, for any cdf $F$, $\alpha_n^F$ and $B_F$ are (in distribution) the quantile transformations (cf. 6.2) of the restricted processes $\alpha_n^G$ and $B_G$, respectively, w.r.t. $F$. Hence the previous lemma allows us to apply the CMT:

$$\alpha_n^F \xrightarrow{\mathcal D} B_F \quad \text{in } D(\mathbb R).$$

A weakly convergent sequence is tight. ■
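The quantile transformation used in this proof can be made concrete by a standard coupling (an assumption of this sketch; the proof itself only needs equality in distribution): if $U_k$ are uniform on $(0,1]$ and $X_k = F^{-1}(U_k)$ with the generalized inverse $F^{-1}(u) = \inf\{x : F(x) \ge u\}$, then $\{X_k \le t\} = \{U_k \le F(t)\}$, so $\alpha_n^F(t) = \alpha_n^G(F(t))$ holds path by path, even when $F$ has jumps. A Python sketch with NumPy and a hypothetical three-atom cdf:

```python
import numpy as np

# Hypothetical cdf with jumps: atoms 0, 1, 2 with masses 0.2, 0.3, 0.5.
SUPPORT = np.array([0.0, 1.0, 2.0])
CUM = np.cumsum([0.2, 0.3, 0.5])             # values of F at the atoms

def F(t):
    """Right-continuous step cdf F(t) = P(X <= t), vectorised over t."""
    k = np.searchsorted(SUPPORT, t, side="right")       # number of atoms <= t
    return np.where(k == 0, 0.0, CUM[np.maximum(k - 1, 0)])

def F_inv(u):
    """Generalised inverse F^{-1}(u) = inf{x : F(x) >= u} for u in (0, 1]."""
    return SUPPORT[np.searchsorted(CUM, u, side="left")]

rng = np.random.default_rng(0)
n = 200
u = 1.0 - rng.random(n)                      # uniform sample on (0, 1], cdf G
x = F_inv(u)                                 # quantile transformation: sample with cdf F
t = np.linspace(-1.0, 3.0, 81)
Ft = F(t)

# alpha_n^F(t) = sqrt(n) (F_n(t) - F(t)), computed from the x-sample
alpha_F = np.sqrt(n) * ((x[None, :] <= t[:, None]).mean(axis=1) - Ft)
# alpha_n^G(F(t)) = sqrt(n) (G_n(F(t)) - F(t)), computed from the u-sample
alpha_G_at_F = np.sqrt(n) * ((u[None, :] <= Ft[:, None]).mean(axis=1) - Ft)

print(np.array_equal(alpha_F, alpha_G_at_F))   # True: identical paths
```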


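Proposition 6.4 Let $\tau \in \mathbb R$ and let $F$ be an arbitrary cdf such that Condition C.1 is satisfied. Then

$$\bigl(\alpha_n(t_1), \dots, \alpha_n(t_k), \beta_n(t_1), \dots, \beta_n(t_k)\bigr) \xrightarrow{\mathcal D} \bigl(B_F(t_1), \dots, B_F(t_k), N_0(t_1), \dots, N_0(t_k)\bigr)$$

holds true for all $k \in \mathbb N$ and $t_1, \dots, t_k \in \mathbb R$.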


A few remarks before we come to the proof: showing $\alpha_n \xrightarrow{\text{fidi}} B_F$ is a straightforward application of the multivariate CLT. The result $\beta_n \xrightarrow{\text{fidi}} N_0$ is also easy to get using a Poisson-type limit theorem. Of course, it does not suffice to show these two statements separately. The two sequences are not independent of each other, and we need to show the finite-dimensional convergence of the joint sequence. Neither of the approaches for the marginals works here. We prove Proposition 6.4 by showing the pointwise convergence of the corresponding characteristic functions. This involves some lengthy calculations, so we restrict our demonstration to

$$(\alpha_n(t), \beta_n(t)) \xrightarrow{\mathcal D} (B_F(t), N_0(t)),$$

and only for $t > 0$. The case $t < 0$ works just the same. The full-detail proof treating arbitrary tuples $(t_1, \dots, t_k) \in \mathbb R^k$ is written down in [Vog05]. For the characteristic function of $(\alpha_n(t), \beta_n(t))$ we write $\psi_n^{(t)}$, or short $\psi_n$, and $\psi^{(t)}$, or short $\psi$, for the characteristic function of $(B_F(t), N_0(t))$. Since $B_F(t)$ and $N_0(t)$ are independent, we can write down, if $t > 0$,

$$\psi^{(t)}(x, y) = \exp\Bigl\{-\tfrac12 F(t)\bigl(1 - F(t)\bigr)x^2 + \varrho_2 t\,\bigl(e^{iy} - 1\bigr)\Bigr\}. \qquad (14)$$

If $t < 0$, then $\psi^{(t)}(x, y)$ contains a term with $\varrho_1$ instead of $\varrho_2$.
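A quick computation shows both why the marginal arguments do not suffice and why the dependence fades. With $U = \mathbf 1_{(-\infty,t]}(X_1)$, $V = \mathbf 1_{(\tau,\tau+\frac tn]}(X_1)$ and $p = F(\tau + \frac tn) - F(\tau)$, one gets $\operatorname{Cov}(\alpha_n(t), \beta_n(t)) = \sqrt n\,\operatorname{Cov}(U,V)$, which equals $\sqrt n\,p\,(1-F(t))$ if $\tau + \frac tn \le t$ and $-\sqrt n\,p\,F(t)$ if $t \le \tau$; since $np \to \varrho_2 t$, this is of order $n^{-1/2}$. Vanishing covariance is of course weaker than independence; the characteristic-function argument is what proves the latter. The Python sketch below (NumPy assumed; the uniform cdf, for which $\varrho_2 = 1$, and the function names are our own illustrative choices) evaluates the exact covariance and compares it with a Monte Carlo estimate based on multinomial cell counts.

```python
import numpy as np

def uniform_cdf(s):
    """cdf of the uniform(0,1) distribution; its derivative at any tau in (0,1) is 1."""
    return min(max(s, 0.0), 1.0)

def exact_cov(n, t, tau, F):
    """Cov(alpha_n(t), beta_n(t)) for t > 0, i.e. sqrt(n) * Cov(U, V) with
    U = 1{X <= t} and V = 1{tau < X <= tau + t/n}."""
    p = F(tau + t / n) - F(tau)              # P(V = 1)
    if tau + t / n <= t:                     # V = 1 implies U = 1
        cov_uv = p * (1 - F(t))
    elif t <= tau:                           # U and V are never both 1
        cov_uv = -p * F(t)
    else:
        raise ValueError("choose n with tau + t/n <= t")
    return np.sqrt(n) * cov_uv

def mc_cov(n, t, tau, F, reps, rng):
    """Monte Carlo estimate for the case tau + t/n <= t: the counts of the cells
    {V=1}, {U=1, V=0}, {U=0} are multinomial, which is all we need to simulate."""
    p, Ft = F(tau + t / n) - F(tau), F(t)
    counts = rng.multinomial(n, [p, Ft - p, 1 - Ft], size=reps)
    alpha = (counts[:, 0] + counts[:, 1] - n * Ft) / np.sqrt(n)
    beta = counts[:, 0]
    return np.cov(alpha, beta)[0, 1]

for n in (100, 10_000, 1_000_000):
    print(n, exact_cov(n, 0.5, 0.3, uniform_cdf))     # t(1 - F(t))/sqrt(n) here
rng = np.random.default_rng(1)
print(mc_cov(400, 0.5, 0.3, uniform_cdf, 200_000, rng), exact_cov(400, 0.5, 0.3, uniform_cdf))
```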


The next lemma specifies $\psi_n^{(t)}$ for $t > 0$.


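Lemma 6.5 Assume $t > 0$.

(1) If $t > \tau$ and $n$ is sufficiently large such that $\tau + \frac tn < t$, then the characteristic function $\psi_n^{(t)}\colon \mathbb R^2 \to \mathbb C$ of $(\alpha_n(t), \beta_n(t))$ is given by

$$\psi_n^{(t)}(x, y) = \exp\bigl\{-ix\sqrt n\,F(t)\bigr\}\Bigl[1 + F(t)\bigl(e^{ix/\sqrt n} - 1\bigr) + \bigl(F(\tau + \tfrac tn) - F(\tau)\bigr)\bigl(e^{iy} - 1\bigr)e^{ix/\sqrt n}\Bigr]^n.$$

(2) If $t < \tau$, then $\psi_n^{(t)}\colon \mathbb R^2 \to \mathbb C$ is given by

$$\psi_n^{(t)}(x, y) = \exp\bigl\{-ix\sqrt n\,F(t)\bigr\}\Bigl[1 + F(t)\bigl(e^{ix/\sqrt n} - 1\bigr) + \bigl(F(\tau + \tfrac tn) - F(\tau)\bigr)\bigl(e^{iy} - 1\bigr)\Bigr]^n.$$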


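Proof. Keep in mind that $t > 0$. By definition of the characteristic function,

$$\psi_n^{(t)}(x, y) = \mathbb E\exp\Bigl\{\frac{ix}{\sqrt n}\sum_{k=1}^n\bigl(\mathbf 1_{(-\infty,t]}(X_k) - F(t)\bigr) + iy\sum_{k=1}^n \mathbf 1_{(\tau,\tau+\frac tn]}(X_k)\Bigr\}.$$

Since the $X_k$, $k = 1, \dots, n$, are i.i.d., this transforms to

$$\psi_n^{(t)}(x, y) = \Bigl(\mathbb E\exp\Bigl\{\frac{ix}{\sqrt n}\bigl(\mathbf 1_{(-\infty,t]}(X_1) - F(t)\bigr) + iy\,\mathbf 1_{(\tau,\tau+\frac tn]}(X_1)\Bigr\}\Bigr)^n. \qquad (15)$$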


The right-hand side is the expectation of a function of the discrete random vector $\bigl(\mathbf 1_{(-\infty,t]}(X_1), \mathbf 1_{(\tau,\tau+\frac tn]}(X_1)\bigr)$, whose distribution we know:

$$\begin{array}{c|c|c}
\text{value of } \bigl(\mathbf 1_{(-\infty,t]}(X_1), \mathbf 1_{(\tau,\tau+\frac tn]}(X_1)\bigr) & \text{probability if } \tau < t & \text{probability if } t < \tau \\ \hline
(0,0) & 1 - F(t) & 1 - F(\tau + \frac tn) + F(\tau) - F(t) \\
(0,1) & 0 & F(\tau + \frac tn) - F(\tau) \\
(1,0) & F(t) - F(\tau + \frac tn) + F(\tau) & F(t) \\
(1,1) & F(\tau + \frac tn) - F(\tau) & 0
\end{array}$$

Thus in both cases we can write the expectation in (15) as a sum of three summands. Some regrouping yields the expressions in Lemma 6.5. ■
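The regrouping can be double-checked numerically. The Python sketch below (function names hypothetical; the uniform cdf serves only as an illustration) evaluates (15) directly as the $n$-th power of the four-term expectation from the table and compares it with the closed forms of Lemma 6.5. In case (2) the code also accepts $t = \tau$, where the two intervals are still disjoint.

```python
import cmath
from math import sqrt

def psi_n_table(x, y, n, t, tau, F):
    """psi_n^{(t)}(x, y) via (15): n-th power of the expectation over the four values
    of (U, V) = (1{X_1 <= t}, 1{tau < X_1 <= tau + t/n}), weighted as in the table."""
    Ft, p = F(t), F(tau + t / n) - F(tau)
    if tau + t / n <= t:          # case (1): (tau, tau+t/n] lies inside (-inf, t]
        probs = {(0, 0): 1 - Ft, (0, 1): 0.0, (1, 0): Ft - p, (1, 1): p}
    elif t <= tau:                # case (2): the two intervals are disjoint
        probs = {(0, 0): 1 - p - Ft, (0, 1): p, (1, 0): Ft, (1, 1): 0.0}
    else:
        raise ValueError("n too small: tau < t < tau + t/n")
    one_obs = sum(q * cmath.exp(1j * x * (u - Ft) / sqrt(n) + 1j * y * v)
                  for (u, v), q in probs.items())
    return one_obs ** n

def psi_n_closed(x, y, n, t, tau, F):
    """Closed forms of Lemma 6.5 (1) and (2)."""
    Ft, p = F(t), F(tau + t / n) - F(tau)
    e = cmath.exp(1j * x / sqrt(n))
    if tau + t / n <= t:
        bracket = 1 + Ft * (e - 1) + p * (cmath.exp(1j * y) - 1) * e
    elif t <= tau:
        bracket = 1 + Ft * (e - 1) + p * (cmath.exp(1j * y) - 1)
    else:
        raise ValueError("n too small: tau < t < tau + t/n")
    return cmath.exp(-1j * x * sqrt(n) * Ft) * bracket ** n

def uniform_cdf(s):
    return min(max(s, 0.0), 1.0)

for t in (0.5, 0.2):              # t > tau and t < tau, with tau = 0.3
    a = psi_n_table(1.3, 0.7, 50, t, 0.3, uniform_cdf)
    b = psi_n_closed(1.3, 0.7, 50, t, 0.3, uniform_cdf)
    print(t, abs(a - b))          # agreement up to rounding error
```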


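Proof of Proposition 6.4. We show $\psi_n^{(t)} \to \psi^{(t)}$ pointwise for all $t > 0$. We apply the following result from complex analysis: for complex numbers $c$ and $c_n$, $n \in \mathbb N$,

$$c_n \to c \implies \Bigl(1 + \frac{c_n}{n}\Bigr)^n \to e^c. \qquad (16)$$

Hence, writing $\psi_n^{(t)}(x,y)^{1/n}$ for the expectation inside the $n$-th power in (15) and $\ln\psi^{(t)}(x,y)$ for the exponent in (14), it suffices to prove

$$n\bigl(\psi_n^{(t)}(x, y)^{1/n} - 1\bigr) \to \ln\psi^{(t)}(x, y). \qquad (17)$$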


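Call the left-hand side $h_n$ and the right-hand side $h$. Consider first the case of Lemma 6.5 (1), i.e. $\tau < t$. Then by (14) and Lemma 6.5

$$h_n = n\exp\Bigl\{-\frac{ixF(t)}{\sqrt n}\Bigr\}\Bigl[1 + F(t)\bigl(e^{ix/\sqrt n} - 1\bigr) + \bigl(F(\tau + \tfrac tn) - F(\tau)\bigr)\bigl(e^{iy} - 1\bigr)e^{ix/\sqrt n}\Bigr] - n,$$

$$h = -\tfrac12 F(t)\bigl(1 - F(t)\bigr)x^2 + \varrho_2 t\,\bigl(e^{iy} - 1\bigr).$$

We break the convergence $h_n \to h$ down into two parts:

(a) $\displaystyle n\exp\Bigl\{-\frac{ixF(t)}{\sqrt n}\Bigr\}\Bigl[1 + F(t)\bigl(e^{ix/\sqrt n} - 1\bigr)\Bigr] - n \;\to\; -\tfrac12 F(t)\bigl(1 - F(t)\bigr)x^2,$

(b) $\displaystyle n\exp\Bigl\{\frac{ix(1 - F(t))}{\sqrt n}\Bigr\}\bigl(F(\tau + \tfrac tn) - F(\tau)\bigr)\bigl(e^{iy} - 1\bigr) \;\to\; \varrho_2 t\,\bigl(e^{iy} - 1\bigr).$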

For (a): use the Taylor expansion of the exponential function. Bear in mind that it converges uniformly on any compact set. This allows us to write

$$\exp\Bigl\{-\frac{ixF(t)}{\sqrt n}\Bigr\} = 1 - \frac{ixF(t)}{\sqrt n} - \frac{x^2F(t)^2}{2n} + o\Bigl(\frac1n\Bigr) \qquad (n \to \infty)$$

and

$$\exp\Bigl\{\frac{ix}{\sqrt n}\Bigr\} = 1 + \frac{ix}{\sqrt n} - \frac{x^2}{2n} + o\Bigl(\frac1n\Bigr) \qquad (n \to \infty).$$

Plug this into the left-hand side; the rest is computing.
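For completeness, the computation behind (a) runs as follows. Abbreviate $F = F(t)$ and $s = x/\sqrt n$, so that $s^2 = x^2/n$; multiplying the two expansions gives

$$\begin{aligned}
e^{-isF}\bigl(1 + F(e^{is} - 1)\bigr)
&= \Bigl(1 - isF - \tfrac{s^2F^2}{2}\Bigr)\Bigl(1 + isF - \tfrac{s^2F}{2}\Bigr) + o(s^2)\\
&= 1 - \tfrac{s^2F}{2} + s^2F^2 - \tfrac{s^2F^2}{2} + o(s^2)
 = 1 - \tfrac{s^2}{2}F(1 - F) + o(s^2),
\end{aligned}$$

and therefore

$$n\,e^{-ixF/\sqrt n}\bigl(1 + F(e^{ix/\sqrt n} - 1)\bigr) - n = -\tfrac12 F(1 - F)\,x^2 + o(1) \;\to\; -\tfrac12 F(t)\bigl(1 - F(t)\bigr)x^2.$$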
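Part (b) becomes apparent by noting

$$\lim_{n\to\infty} n\bigl(F(\tau + \tfrac tn) - F(\tau)\bigr) = \varrho_2 t \qquad (t > 0)$$

and

$$\lim_{n\to\infty}\exp\Bigl\{\frac{ix(1 - F(t))}{\sqrt n}\Bigr\} = 1.$$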


By adding (a) and (b) we have proved $h_n \to h$. By (16) this implies $\psi_n^{(t)}(x, y) \to \psi^{(t)}(x, y)$, but so far only for $\tau < t$. As for $\tau > t$, the only difference is that in (b) the term $\exp\{ix(1 - F(t))/\sqrt n\}$ is replaced by $\exp\{-ixF(t)/\sqrt n\}$, which of course converges to 1 as well. ■
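The convergence $\psi_n^{(t)} \to \psi^{(t)}$ can also be watched numerically. The following Python sketch assumes, for illustration only, the uniform cdf on $[0,1]$ (so that $\varrho_2 = 1$ at every $\tau \in (0,1)$); it evaluates $\psi_n^{(t)}$ through Lemma 6.5 and compares it with (14) for growing $n$, in both cases $\tau < t$ and $t < \tau$. The printed error should shrink as $n$ grows.

```python
import cmath
from math import sqrt

def uniform_cdf(s):
    return min(max(s, 0.0), 1.0)

def psi_n(x, y, n, t, tau, F):
    """Characteristic function of (alpha_n(t), beta_n(t)), t > 0, via Lemma 6.5."""
    Ft, p = F(t), F(tau + t / n) - F(tau)
    e = cmath.exp(1j * x / sqrt(n))
    if tau + t / n <= t:                          # case (1)
        bracket = 1 + Ft * (e - 1) + p * (cmath.exp(1j * y) - 1) * e
    elif t <= tau:                                # case (2)
        bracket = 1 + Ft * (e - 1) + p * (cmath.exp(1j * y) - 1)
    else:
        raise ValueError("choose n with tau + t/n <= t")
    root = cmath.exp(-1j * x * Ft / sqrt(n)) * bracket   # psi_n = root ** n, h_n = n(root - 1)
    return root ** n

def psi_limit(x, y, t, F, rho2):
    """Limit (14): independent N(0, F(t)(1-F(t))) and Poisson(rho2 * t), t > 0."""
    Ft = F(t)
    return cmath.exp(-0.5 * Ft * (1 - Ft) * x**2 + rho2 * t * (cmath.exp(1j * y) - 1))

x, y, tau = 1.3, 0.7, 0.3
for t in (0.5, 0.2):                              # tau < t and t < tau
    for n in (10, 1_000, 100_000, 10_000_000):
        err = abs(psi_n(x, y, n, t, tau, uniform_cdf) - psi_limit(x, y, t, uniform_cdf, 1.0))
        print(f"t={t}, n={n:>10}: |psi_n - psi| = {err:.2e}")
```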
7 Application in statistics

In this short section we demonstrate with an example how Theorem 2.1 can be useful in statistics. Our arguments will only be sketched briefly. Consider i.i.d. random variables $X_1, \dots, X_n$, $n \in \mathbb N$, with values in $[0,1]$ and common cdf $F = F_{\tau,\gamma}$. Here, $F_{\tau,\gamma}$ is defined on $[0,1]$ as the polygonal line through the points $(0,0)$, $(\tau,\gamma)$ and $(1,1)$, where the parameters $\tau$ and $\gamma$ both lie in the open interval $(0,1)$, and it is assumed that $\tau \ne \gamma$. Thus, $\tau$ is the single point of discontinuity of the corresponding density. In this model Chernoff and Rubin [CR56] investigate the maximum likelihood estimator for $\tau$. An ad hoc estimator for the two-dimensional parameter $(\tau, \gamma)$ is given by

$$\hat\tau_n = \operatorname*{arg\,max}_{t \in [0,1]} |F_n(t) - t| \quad\text{and}\quad \hat\gamma_n = F_n(\hat\tau_n).$$
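The ad hoc estimator is easy to simulate. The sketch below (Python with NumPy; the function names and the convention for locating the argmax are our own assumptions) draws a sample from $F_{\tau,\gamma}$ by inverse-cdf sampling and evaluates $(\hat\tau_n, \hat\gamma_n)$. Since $t \mapsto F_n(t) - t$ decreases between jumps, the supremum of $|F_n(t) - t|$ over $[0,1]$ is attained or approached at an order statistic $X_{(k)}$; when it is only approached from the left (the case $\gamma < \tau$), we take that order statistic as $\hat\tau_n$.

```python
import numpy as np

def sample_polygonal(n, tau, gamma, rng):
    """Inverse-cdf sampling from F_{tau,gamma}, the polygonal cdf through (0,0),
    (tau,gamma), (1,1): density gamma/tau on [0,tau], (1-gamma)/(1-tau) on (tau,1]."""
    u = rng.random(n)
    return np.where(u <= gamma,
                    u * tau / gamma,
                    tau + (u - gamma) * (1 - tau) / (1 - gamma))

def ad_hoc_estimator(x):
    """(tau_hat, gamma_hat): tau_hat maximises |F_n(t) - t| over [0,1], gamma_hat = F_n(tau_hat).
    At X_(k), F_n(t) - t peaks with value k/n - X_(k); t - F_n(t) is approached
    from the left with value X_(k) - (k-1)/n."""
    xs = np.sort(x)
    n = xs.size
    k = np.arange(1, n + 1)
    dev = np.maximum(k / n - xs, xs - (k - 1) / n)
    j = int(np.argmax(dev))
    return xs[j], (j + 1) / n        # F_n(X_(k)) = k/n (no ties for a continuous F)

rng = np.random.default_rng(2024)
tau, gamma, n = 0.4, 0.7, 20_000
tau_hat, gamma_hat = ad_hoc_estimator(sample_polygonal(n, tau, gamma, rng))
print(n * (tau_hat - tau), np.sqrt(n) * (gamma_hat - gamma))
```

Repeating the last three lines over many seeds gives an empirical picture of the joint law of the rescaled pair studied next.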

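A key role in the analysis of the pair

$$\bigl(n(\hat\tau_n - \tau),\ \sqrt n(\hat\gamma_n - \gamma)\bigr)$$

is played by the observation that it has the same limit distribution as

$$\Bigl(\operatorname*{arg\,max}_{t \in \mathbb R}\bigl\{\operatorname{sign}(\gamma - \tau)\bigl(\beta_n^{F,\tau}(t) - t\bigr)\bigr\},\ \alpha_n^F(\tau)\Bigr).$$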

Thus Theorem 2.1 and a (formal) application of the CMT yield convergence in distribution:

$$\bigl(n(\hat\tau_n - \tau),\ \sqrt n(\hat\gamma_n - \gamma)\bigr) \xrightarrow{\mathcal D} (A, B), \qquad (18)$$


where

$$A = \operatorname*{arg\,max}_{t \in \mathbb R}\bigl\{\operatorname{sign}(\gamma - \tau)\bigl(N_0(t) - t\bigr)\bigr\}$$

and $B \sim N\bigl(0, F(\tau)(1 - F(\tau))\bigr) = N\bigl(0, \gamma(1 - \gamma)\bigr)$ are independent. The rates of $N_0 = N_0^{\varrho_1,\varrho_2}$ are given by $\varrho_1 = \gamma/\tau$ and $\varrho_2 = (1 - \gamma)/(1 - \tau)$. In [Fer05] we give a representation of $A$ in terms of the arrival times of $N_0$, which shows that with probability 1 the maximizing point $A$ is uniquely determined. Moreover, $A$ is seen to have a continuous cdf. A rigorous proof of (18) and further information will be published elsewhere.